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based distance measure offers, to some extend, a remedy to this problem. Our scheme requires the 
eigen-decomposition of a covariance matrix and is as computationally efficient as standard £2 PCA. We 
>> demonstrate some of its favorable properties on robust subspace estimation. 
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Abstract 

We introduce the notion of Principal Component Analysis (PCA) of image gradient orientations. 
As image data is typically noisy, but noise is substantially different from Gaussian, traditional PCA 
of pixel intensities very often fails to estimate reliably the low-dimensional subspace of a given data 
population. We show that replacing intensities with gradient orientations and the £ 2 norm with a cosine- 
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NOTATION 


5, {.} 


set 


3? 


set of reals 


C 


set of complex numbers 


X 


scalar 


X 


column vector 


X 


matrix 


*-mxm 


m x m identity matrix 


a(k) 


k-th element of vector a 


N(S) 


cardinality set S 


11-11 


£ 2 norm 


II- \\f 


Frobenius norm 


Z H 


conjugate transpose of Z 


U[a, b] 


uniform distribution in [a, b] 


E[.] 


mean value operator 



x ~ U[a, b] x follows U[a, b] 

I. Introduction 

Provision for mechanisms capable of handling gross errors caused by possible arbitrarily large 
model deviations is a typical prerequisite in computer vision. Such deviations are not unusual 
in real-world applications where data contain artifacts due to occlusions, illumination changes, 
shadows, reflections or the appearance of new parts/objects. In most cases, such phenomena 
cannot be described by a mathematically well-defined generative model and are usually referred 
as outliers in learning and parameter estimation. 

In this paper, we propose a new avenue for Principal Component Analysis (PCA), perhaps 
the most classical tool for dimensionality reduction and feature extraction in pattern recognition. 
Standard PCA estimates the k— rank linear subspace of the given data population, which is 
optimal in a least-squares sense. Unfortunately £ 2 norm enjoys optimality properties only when 
noise is i.i.d. Gaussian; for data corrupted by outliers, the estimated subspace can be arbitrarily 
biased. 
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Robust formulations to PCA, such as robust co variance matrix estimators flU, ED, are com- 
putationally prohibitive for high dimensional data such as images. Robust approaches, well- 
suited for computer vision applications, include t\ [J3J , flU, robust energy function @ and 
weighted combination of nuclear norm and l\ minimization fl6], 0. t\ -based approaches can 
be computational efficient, however the gain in robustness is not always significant. The M- 
Estimation framework of [5J is robust but suitable only for relatively low dimensional data or 
off-line processing. Under weak assumptions 0, the convex optimization formulation of 0, 
perfectly recovers the low dimensional subspace of a data population corrupted by sparse 
arbitrarily large errors; nevertheless efficient reformulations of standard PCA can be orders of 
magnitude faster. 

In this paper we look at robust PCA from a completely different perspective. Our scheme 
does not operate on pixel intensities. In particular, we replace pixel intensities with gradient 
orientations. We define a notion of pixel-wise image dissimilarity by looking at the distribution 
of gradient orientation differences; intuitively this must be approximately uniform in [0, 2tt). We 
then assume that local orientation mismatches caused by outliers can be also well-described by 
a uniform distribution which, under some mild assumptions, is canceled out when we apply the 
cosine kernel. This last observation has been noticed in recently proposed schemes for image 
registration [8J. Following this line of research, we show that a cosine-based distance measure 
has a functional form which enables us to define an explicit mapping from the space of gradient 
orientations into a high-dimensional complex sphere where essentially linear complex PCA is 
performed. The mapping is one-to-one and therefore PCA-based reconstruction in the original 
input space is direct and requires no further optimization. Similarly to standard PCA, the basic 
computational module of our scheme requires the eigen-decomposition of a covariance matrix, 
while high dimensional data can be efficiently analyzed following the strategy suggested in Turk 
and Pentland's Eigenfaces 10. 

II. £ 2 -BASED PCA OF PIXEL INTENSITIES 

Let us denote by Xj G W the p— dimensional vector obtained by writing image Ij G !R miXm2 
in lexicographic ordering. We assume that we are given a population of n samples X = 
[x x | • • • |x n ] G W xn . Without loss of generality, we assume zero-mean data. PCA finds a set 
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of k < n orthonormal bases B fc = [bi| • • • |b fc ] E W xk by minimizing the error function 

e(B k ) = \\X-B k B T k X\\ 2 F . (ELI) 

The solution is given by the eigenvectors corresponding to the k largest eigenvalues obtained 
from the eigen-decomposition of the covariance matrix XX T . Finally, the reconstruction of X 
from the subspace spanned by the columns of B^ is given by X = B^Cfc, where C^ = B^X is 
the matrix which gathers the set of projection coefficients. 

For high dimensional data and Small Sample Size (SSS) problems (i.e. n <C p), an efficient 
implementation of PCA in 0(n 3 ) (instead of 0(p 3 )) was proposed in [9J. Rather than computing 
the eigen-analysis of XX T , we compute the eigen- analysis of X T X and make use of the 
following theorem 
Theorem I 
Define matrices A and B such that A = TT H and B = T H T with T E C mxr . Let U A and 

Ub be the eigenvectors corresponding to the non-zero eigenvalues A a and A# of A and B, 

_ i 
respectively. Then, Aa = A B and U4 = TUbA^ 2 . 

III. Random number generation from gradient orientation differences 

We formalize an observation for the distribution of gradient orientation differences which does 
not appear to be well-known in the scientific community [J. Consider a set of images { J,}. At 
each pixel location, we estimate the image gradients and the corresponding gradient orientation q 
We denote by {$«}, <&i E [0, 27r) miXm2 the set of orientation images and compute the orientation 
difference image 

A* 4i = *,-*,-. (DI. 1) 

We denote by 4 > i and A0j ■ = (j) i — <pj the p— dimensional vectors obtained by writing $, and 
A<&jj in lexicographic ordering and V = {1, . . . , p] the set of indices corresponding to the image 
support. We introduce the following definition. 
Definition Images Jj and 3j are pixel-wise dissimilar if VA; E V, Atfr^h) ~ U[0, 2ir). 

'This observation has been somewhat noticed in 1 10 1 with no further comments on its implications. 

2 More specifically, we compute $i = arctan Gi^/Gi^, where d, x — h x * h, Gi, y = h y * Ii and h x , h y are filters used to 
approximate the ideal differentiation operator along the image horizontal and vertical direction respectively. Possible choices for 
h x , h y include central difference estimators of various orders and discrete approximations to the first derivative of the Gaussian. 
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Not surprisingly, nature is replete with images exemplifying Definition 1. This, in turn, makes 
it possible to set up a naive image-based random generator. To confirm this, we used more than 
70, 000 pairs of image patches of resolution 200 x 200 randomly extracted from natural images 
IfTTTl . For each pair, we computed A0j- and formulated the following null hypothesis 

. H : VkeV A0 4i (fc)~I7[O,27r). 
which was tested using the Kolmogorov-Smirnov test lfT2ll . For a significance level equal to 
0.01, the null hypothesis was accepted for 94.05% of the image pairs with mean p-value equal 
to 0.2848. In a similar setting, we tested Matlab's random generator. The null hypothesis was 
accepted for 99.48% of the cases with mean p- value equal to 0.501. Fig. (U(a)-(b) show a typical 
pair of image patches considered in our experiment. Fig. [Q (c) and (d) plot the histograms of 
the gradient orientation differences and 40,000 samples drawn from Matlab's random number 
generator respectively. 




(a) 






<,■( 



(d) 



Fig. 1. (a)-(b) An image pair used in our experiment, (c) Image-based random number generator: histogram of 40,000 gradient 
orientation differences and (d) Histogram of 40,000 samples drawn from Matlab's random number generator. 
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IV. PCA OF GRADIENT ORIENTATIONS 

A. Cosine-based correlation of gradient orientations 

Given the set of our images {It}, we compute the corresponding set of orientation images 
{<frj} and measure image correlation using the cosine kernel 



sfc,^) ±J2 C0S l A< t>iM = cN ( v ) 



(IV.l) 



kev 



where c E [—1,1]. Notice that for highly spatially correlated images AfaJk) ~ and c 



Assume that there exists a subset V2 C V corresponding to the set of pixels corrupted by 
outliers. For V\ = V — V%, we have 



s 1 (d> i ,(j> j ) = J2 «»[A^(fc)] = cyNpi) 

fcePi 



(IV.2) 



where ci G [—1,1]. 

Not unreasonably, we assume that in P 2 > the images are pixel- wise dissimilar according to 
Definition 1 . For example, Fig. |2] (a)-(b) show an image pair where Vi is the part of the face 
occluded by the scarf and Fig. [2](c) plots the distribution of A<p in T^- Before proceeding for 





(a) 



Ch) 



(c) 



Fig. 2. (a)-(b) An image pair used in our experiments, (c) The distribution of A</> for the part of face occluded by the scarf. 



V2, we need the following theorem |fr2|. 
Theorem II 

Let u(.) be a random process and u(t) ~ U[0, 2n) then: 

• ^\J X cos u(t)dt] = for any non-empty interval X of 3£. 

• If u(.) is mean ergodic, then f x cosu(t)dt = 0. 
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We also make use of the following approximation 

J cos[A^.(t)]dt « J2 cos[A^.(A;)] (IV.3) 

where with some abuse of notation, A<^ • is defined in the continuous domain on the left hand 
side of (IIV.3I) . Completely analogously, the above theorem and approximation hold for the case 
of the sine kernel. 

Using the above results, for P 2 , we have 

S2 (0„ 4> 3 ) = E C0S l A M k )l - ° ( IV - 4 ) 

keV 2 
It is not difficult to verify that £ 2 -based correlation i.e. the inner product between two images will 
be zero if and only if the images have interchangeably black and white pixels. Our analysis and 
(IIV.4I) show that cosine-based correlation of gradient orientations allows for a much broader class 
of uncorrected images. Overall, unlike £ 2 -based correlation where the contribution of outliers can 
be arbitrarily large, s(.) measures correlation as s{(j> i , <J>a) = s 1 (4> i , <j>A + s 2 (0j, </>■) ~ c\N(Vi), 
i.e. the effect of outliers is approximately canceled out. 

B. The principal components of image gradient orientations 

To show how (II V. 1 1) can be used as a basis for PCA, we first define the distance 

v 
d\^ 4>.) = ]T{1 - cos[A^.(A;)]} (IV.5) 

fc=i 

We can write (IIV.5I) as follows 

d\4> % ^ 3 ) = IJ2{2- 2 008^) -<f>M} 

fc=i 

= i|| e M_ e ^||2 (IV .6) 

2 ll II v 

where e- 7 ^ = [e^ 1 ), . . . , e < Pi^] T . The last equality makes the basic computational module of 
our scheme apparent. We define the mapping from [0, 2tt) p onto a subset of complex sphere with 



radius y/N(V) 

Zi(<fc) = e?& (IV.7) 
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and apply linear complex PC A to the transformed data z, ; . 

Using the results of the previous subsection, we can remark the following 
Remark I If V = V x U V 2 with A0 y (fc) ~ U[0, 2tt), Vfc e V 2 , then Re[zf z 3 ] ~ c 1 N(V 1 ) 
Remark II If V 2 = V, then Re[zf z d ] ~ and Im[zf Zj] ~ 0. 

Further geometric intuition about the mapping z$ is provided by the chord between vectors Zj 
and z, 



crd( Zi , Zi ) = J(z< - Zj )H( Zi - Zj -) = J2d2(0 i , 0,) (IV.8) 



Using crd(.), the results of Remark 1 and 2 can be reformulated as crd(z i7 Zj) ~ y 2((1 — c\)N(Vi) + N(V 2 )) 
and crd(zj.Zj) ~ ^2N(V) respectively. 

Overall, Algorithm 1 summarizes the steps of our PCA of gradient orientations. 

Algorithm 1. Estimating the principal subspace 

Inputs: A set of n orientation images <I>j, i — 1, . . . , n of p pixels and the number k of principal 

components. 

Step 1. Obtain <p i by writing $j in lexicographic ordering. 

Step 2. Compute Zj = e^s form the matrix of the transformed data Z = [zi| • • • |z n ] G C pxn 

and compute the matrix T = Z H Z E TZ nxn . 

Step 3. Compute the eigen-decomposition of T = UAU tf and denote by U^ G C pxk and 

_ i 
A fc G 7£ fcxfc the /c— reduced set. Compute the principal subspace from B fc = ZXJ k A k 2 G C pxh . 

Step 4. Reconstruct using Z = B fc BfZ. 

Step 5. Go back to the orientation domain using $ = ZZ. 

Let us denote by Q = {1, . . . , n} the set of image indices and Qi any subset of Q. We can 
conclude the following 

Remark III If Q = Q 1 U Q 2 with zfz^ ~ Vz G Q 2 > Vj' G Q and z 7^ j, then, 3 eigenvector 
b ; of B„ such that b, ~ j^z-i- 

A special case of Remark III is the following 



r^~y 



Remark IV If Q = Q 2 , then ]^yA ~ I nxn and B n _ ^^ 

To exemplify Remark IV, we computed the eigen- spectrum of 100 natural image patches. 
In a similar setting, we computed the eigen- spectrum of samples drawn from Matlab's random 
number generator. Fig. [3] plots the two eigen-spectrums. 
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Fig. 3. The eigen-spectrum of natural images and the eigen-spectrum of samples drawn from Matlab's random number generator. 

Finally, notice that our framework also enables the direct embedding of new samples. Algorithm 
2 summarizes the procedure. 
Algorithm 2. Embedding of new samples 

Inputs: An orientation image © of p pixels and the principal subspace B fc of Algorithm 1. 
Step 1. Obtain 6 by writing © in lexicographic ordering. 



iO 



Step 2. Compute z = e J and reconstruct using z = B fc B£f z - 
Step 3. Go back to the orientation domain using = Zz. 



V. Results 



A. Face reconstruction 



The estimation of a low-dimensional subspace from a set of a highly-correlated images is a 
typical application of PCA IfTBI . As an example, we considered a set of 50 aligned face images 
of image resolution 192 x 168 taken from the Yale B face database lfl4ll . The images capture 
the face of the same subject under different lighting conditions. This setting usually induces 
cast shadows as well as other specularities. Face reconstruction from the principal subspace is 
a natural candidate for removing these artifacts. 

We initially considered two versions of this experiment. The first version used the set of 
original images. In the second version, 20% of the images was artificially occluded by a 70 x 70 
"Baboon" patch placed at random spatial locations. For both experiments, we reconstructed pixel 
intensities and gradient orientations with £ 2 PCA and PCA of gradient orientations respectively 
using the first 5 principal components. 
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Fig. |4] and Fig. [5] illustrate the quality of reconstruction for 2 examples of face images 
considered in our experiments. While PCA-based reconstruction of pixel intensities is visually 
appealing in the first experiment, Fig. |4] (g)-(h) clearly illustrate that, in the second experiment, 
the reconstruction suffers from artifacts. In contrary, Fig. \5\ (e)-(f) and (g)-(h) show that PCA- 
based reconstruction of gradient orientations not only reduces the effect of specularities but also 
reconstructs the gradient orientations corresponding to the "face" component only. 

This performance improvement becomes more evident by plotting the principal components for 
each method and experiment. Fig. [6] shows the 5 dominant Eigenfaces of £ 2 PCA. Observe that, 
in the second experiment, the last two Eigenfaces (Fig. [6] (i) and (j)) contain "Baboon" ghosts 
which largely affect the quality of reconstruction. In contrary, a simple visual inspection of Fig. 
U\ reveals that, in the second experiment, the principal subspace of gradient orientations (Fig. [7] 
(f)-(j)) is artifact-free which in turn makes dis-occlusion in the orientation domain feasible. 

Finally, to exemplify Remark 3, we considered a third version of our experiment where 20% 
of the images were replaced by the same 192 x 168 "Baboon" image. Fig. [8] (a)-(e) and (f)- 
(j) illustrate the principal subspace of pixel intensities and gradient orientations respectively. 
Clearly, we may observe that £ 2 PCA was unable to handle the extra-class outlier. In contrary, 
PCA of gradient orientations successfully separated the "face" from the "Baboon" subspace i.e. 
no eigenvectors were corrupted by the "Baboon" image. Note that the "face" principal subspace 
is not the same as the one obtained in versions 1 and 2. This is because only 80% of the images 
in our dataset was used in this experiment. 

VI. Conclusions 

We introduced a new concept: PCA of gradient orientations. Our framework is as simple 
as standard £ 2 PCA, yet much more powerful for efficient subspace-based data representation. 
Central to our analysis is the distribution of gradient orientation differences and the cosine 
kernel which provide us a consistent way to measure image dissimilarity. We showed how this 
dissimilarity measure can be naturally used to formulate a robust version of PCA. Extensions 
of our scheme span a wide range of theoretical topics and applications; from statistical machine 
learning and clustering to object recognition and tracking. 
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Fig. 4. PCA-based reconstruction of pixel intensities, (a)-(b) Original images used in version 1 of our experiment, (c)-(d) 
Corrupted images used in version 2 of our experiment, (e)-(f) Reconstruction of (a)-(b) with 5 principal components, (g)-(h) 
Reconstruction of (c)-(d) with 5 principal components. 
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Fig. 5. PCA-based reconstruction of gradient orientations, (a)-(b) Original orientations used in version 1 of our experiment. 
(c)-(d) Corrupted orientations used in version 2 of our experiment, (e)-(f) Reconstruction of (a)-(b) with 5 principal components. 
(g)-(h) Reconstruction of (c)-(d) with 5 principal components. 
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Fig. 6. The 5 principal components of pixel intensities for (a)-(e) version 1 and (f)-(j) version 2 of our experiment. 
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Fig. 7. The 5 principal components of gradient orientations for (a)-(e) version 1 and (f)-(j) version 2 of our experiment. 
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Fig. 8. (a)-(e) The 5 principal components of pixel intensities for version 3 of our experiment and (f)-(j) The 5 principal 
components of gradient orientations for the same experiment. 
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